The system first divides web information into domains according to pattern and style and builds a domain knowledge base, which guides users' queries and narrows their scope. Second, the user runs keyword queries through existing search engines, browses the resulting pages, selects a page of interest as the sample page, and creates a user-defined conceptual schema based on an understanding of that page's content. Third, the user marks the information blocks of interest in the sample page and establishes the correspondence between those blocks and the fields of the conceptual schema; meanwhile, the system passes the correspondence to the learning module, which forms extraction rules and stores them in the rule database. Fourth, the extraction module applies the extraction rules to similarly structured pages, assembles the extracted information into records that follow the user-defined conceptual schema, and stores the records by domain in the classified cache database for users' further queries. Finally, the user uses the query function to carry out personalized queries.
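The mark, learn, extract, and cache steps can be illustrated with a minimal sketch. The text does not specify what form the extraction rules take, so this example assumes a simple left/right delimiter rule (the text surrounding each marked block on the sample page); the names `Rule`, `learn_rules`, `extract`, and `store`, the `context` width, and the sample pages are all hypothetical and are not taken from the system described.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    field: str  # field name in the user's conceptual schema
    left: str   # text immediately before the marked block (assumed rule form)
    right: str  # text immediately after the marked block (assumed rule form)


def learn_rules(sample_page, marks, context=12):
    """Turn each field-to-block correspondence into one delimiter rule."""
    rules = []
    for field, text in marks.items():
        start = sample_page.index(text)  # raises ValueError if the mark is absent
        end = start + len(text)
        left = sample_page[max(0, start - context):start]
        right = sample_page[end:end + context]
        rules.append(Rule(field, left, right))
    return rules


def extract(page, rules):
    """Apply the rules to a similarly structured page and build one record."""
    record = {}
    for rule in rules:
        pos = page.find(rule.left)
        if pos == -1:
            record[rule.field] = None  # left delimiter not found
            continue
        start = pos + len(rule.left)
        # An empty right delimiter means the block ran to the end of the page.
        end = page.find(rule.right, start) if rule.right else len(page)
        record[rule.field] = page[start:end] if end != -1 else None
    return record


def store(cache, domain, record):
    """Keep extracted records grouped by domain for later queries."""
    cache.setdefault(domain, []).append(record)


if __name__ == "__main__":
    sample = '<li><b class="title">Dune</b><i class="price">9.99</i></li>'
    marks = {"title": "Dune", "price": "9.99"}  # user-marked blocks per schema field
    rules = learn_rules(sample, marks)

    similar = '<li><b class="title">Emma</b><i class="price">4.50</i></li>'
    cache = {}
    store(cache, "books", extract(similar, rules))
    print(cache)
```

Fixed-width delimiters are only one possible rule form; they work here because the two pages share the same markup around each field, which is the "similarly structured pages" condition the text relies on.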